
    When humans and machines collaborate: Cross-lingual Label Editing in Wikidata

    The quality and maintainability of a knowledge graph are determined by the process by which it is created. There are different approaches to such processes: extraction or conversion of data available on the web (e.g., the automated extraction of knowledge from Wikipedia to build DBpedia), community-created knowledge graphs, often built by a group of experts, and hybrid approaches in which humans maintain the knowledge graph alongside bots. In this work, we focus on the hybrid approach of human-edited knowledge graphs supported by automated tools. In particular, we analyse the editing of natural language data, i.e., labels. Labels are the entry point for humans to understand the information and therefore need to be carefully maintained. We take a step toward understanding the collaborative editing of labels by humans and automated tools across languages in a knowledge graph. We use Wikidata, as it has a large and active community of humans and bots working together across more than 300 languages. We analyse the different editor groups and how they interact with data in the different languages in order to understand the provenance of the current label data.
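
    The analysis described above rests on two kinds of data: an item's labels per language and the revision history behind them. As a minimal sketch (not the authors' pipeline), the snippet below uses the public Wikidata API to fetch an item's labels and bucket recent revision authors; note that "username ends in 'bot'" is only a rough heuristic, since authoritative bot status lives in Wikidata's bot user group.

        import requests

        API = "https://www.wikidata.org/w/api.php"

        def get_labels(qid):
            # Fetch every language label for one item via wbgetentities.
            r = requests.get(API, params={"action": "wbgetentities", "ids": qid,
                                          "props": "labels", "format": "json"})
            r.raise_for_status()
            return r.json()["entities"][qid]["labels"]  # {lang: {"language": ..., "value": ...}}

        def editor_groups(qid, limit=50):
            # Bucket recent revision authors with a crude name heuristic;
            # this approximates, but does not equal, the real bot flag.
            r = requests.get(API, params={"action": "query", "prop": "revisions",
                                          "titles": qid, "rvprop": "user",
                                          "rvlimit": limit, "format": "json"})
            r.raise_for_status()
            page = next(iter(r.json()["query"]["pages"].values()))
            counts = {"bot": 0, "human": 0}
            for rev in page.get("revisions", []):
                counts["bot" if rev["user"].lower().endswith("bot") else "human"] += 1
            return counts

        print(len(get_labels("Q42")), "languages,", editor_groups("Q42"))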

    An investigation of techniques that aim to improve the quality of labels provided by the crowd

    The 2013 MediaEval Crowdsourcing task looked at the problem of working with noisy crowdsourced annotations of image data. The aim of the task was to investigate techniques for estimating the true label of an image from a set of noisy crowdsourced labels, possibly together with content and metadata from the image itself. For the runs in this paper, we took a shotgun approach and tried a number of existing techniques, including generative probabilistic models and further crowdsourcing.
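
    The abstract names generative probabilistic models; the baseline such models are usually compared against is simple majority voting over the noisy labels. A minimal sketch of that baseline, with made-up example data:

        from collections import Counter, defaultdict

        def majority_vote(annotations):
            # annotations: iterable of (item_id, worker_id, label) triples.
            # Returns the most frequent label per item - the standard baseline
            # that model-based methods such as Dawid-Skene try to beat.
            votes = defaultdict(Counter)
            for item, _worker, label in annotations:
                votes[item][label] += 1
            return {item: c.most_common(1)[0][0] for item, c in votes.items()}

        crowd = [("img1", "w1", "cat"), ("img1", "w2", "cat"), ("img1", "w3", "dog"),
                 ("img2", "w1", "dog"), ("img2", "w3", "dog")]
        print(majority_vote(crowd))  # {'img1': 'cat', 'img2': 'dog'}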

    DATA:SEARCH'18 - Searching data on the web


    Human Computation and Convergence

    Humans are the most effective integrators and producers of information, both directly and through the use of information-processing inventions. As these inventions become increasingly sophisticated, the substantive role of humans in processing information will tend toward capabilities that derive from our most complex cognitive processes, e.g., abstraction, creativity, and applied world knowledge. Through the advancement of human computation - methods that leverage the respective strengths of humans and machines in distributed information-processing systems - formerly discrete processes will combine synergistically into increasingly integrated and complex information-processing systems. These new, collective systems will exhibit an unprecedented degree of predictive accuracy in modeling physical and techno-social processes, and may ultimately coalesce into a single unified predictive organism with the capacity to address society's most wicked problems and achieve planetary homeostasis.

    A Model for Language Annotations on the Web

    Several annotation models have been proposed to enable a multilingual Semantic Web. Such models home in on the word and its morphology and assume that the language tag and URI come from external resources. These resources, such as ISO 639 and Glottolog, have limited coverage of the world's languages and have, at best, a very limited thesaurus-like structure, which hampers language annotation and hence constrains research in Digital Humanities and other fields. To resolve this 'outsourced' task of the current models, we developed a model for representing information about languages, the Model for Language Annotation (MoLA), such that basic language information can be recorded consistently and subsequently queried and analyzed as well. This includes the various types of languages, families, and the relations among them. MoLA is formalized in OWL so that it can integrate with Linguistic Linked Data resources. Sufficient coverage of MoLA is demonstrated with the use case of French.
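
    As a toy illustration of what recording such information as OWL-compatible triples looks like (the namespace, class, and property names below are hypothetical placeholders, not the published MoLA vocabulary), rdflib can state a language, its family, and the relation between them:

        from rdflib import Graph, Literal, Namespace, RDF, RDFS

        MOLA = Namespace("http://example.org/mola#")  # placeholder namespace

        g = Graph()
        g.bind("mola", MOLA)
        g.add((MOLA.Romance, RDF.type, MOLA.LanguageFamily))
        g.add((MOLA.French, RDF.type, MOLA.Language))
        g.add((MOLA.French, MOLA.memberOf, MOLA.Romance))  # hypothetical property
        g.add((MOLA.French, RDFS.label, Literal("French", lang="en")))
        g.add((MOLA.French, RDFS.label, Literal("français", lang="fr")))

        print(g.serialize(format="turtle"))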

    An architecture for the autonomic curation of crowdsourced knowledge

    Human knowledge curators are intrinsically better than their digital counterparts at providing relevant answers to queries. This is mainly because an experienced biological brain will account for relevant community expertise and exploit the underlying connections between knowledge pieces when offering suggestions pertinent to a specific question, whereas most automated database managers will not. We address this problem by proposing an architecture for the autonomic curation of crowdsourced knowledge, underpinned by semantic technologies. The architecture is instantiated in the career-data domain, yielding Aviator, a collaborative platform capable of producing complete, intuitive, and relevant answers to career-related queries in a time-effective manner. In addition to providing numerical and use-case-based evidence to support these research claims, this extended work also contains a detailed architectural analysis of Aviator to outline its suitability for automatically curating knowledge to a high standard of quality.
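
    Exploiting "the underlying connections between knowledge pieces" maps naturally onto graph queries. A hedged sketch of that idea (the schema is a toy stand-in; Aviator's actual vocabulary is not given in the abstract): suggest roles held by people who share a skill with a given person.

        from rdflib import Graph

        g = Graph()
        g.parse(data="""
            @prefix : <http://example.org/career#> .
            :alice :hasSkill :python ; :heldRole :dataEngineer .
            :bob   :hasSkill :python ; :heldRole :mlEngineer .
        """, format="turtle")

        # One-hop suggestion: roles held by peers sharing a skill with :alice.
        q = """
            PREFIX : <http://example.org/career#>
            SELECT DISTINCT ?role WHERE {
                :alice :hasSkill ?skill .
                ?peer  :hasSkill ?skill ; :heldRole ?role .
                FILTER(?peer != :alice)
            }
        """
        for row in g.query(q):
            print(row.role)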

    Data Work in a Knowledge-Broker Organization: How Cross-Organizational Data Maintenance Shapes Human Data Interactions
